Preprocessing for Online Handwriting Recognition

Authors

  • STEPHEN MCINERNEY
  • RICHARD B. REILLY
Abstract

Traditionally, Online Handwriting Recognition (OHR) implementations use general-purpose processor architectures. The pre-processing step of OHR comprises regular array-based tasks such as normalisation, feature extraction and segmentation. Standard processor architectures cannot, however, efficiently support the varied arithmetic operations required by pre-processing. These tasks would seem ideally suited to custom hardware acceleration. CORDIC offers all the required elementary functions for pre-processing but is inefficient for linear-mode operations (multiplication/division) due to its serial nature. A hybrid Multiplier/CORDIC architecture is proposed in which a fast iterative multiplier/MAC shares hardware with a serial CORDIC unit. This multiplier retires 6 bits per cycle with minor additional hardware requirements. The hybrid is shown to offer greatly improved performance for signal-processing applications compared to standard fixed-point (FXP) architectures. Performance results are included demonstrating significant improvements on the pre-processing task of OHR.

1. ONLINE HANDWRITING RECOGNITION

The growth in use of Personal Digital Assistants (PDAs) generates a need for accurate OHR under significant constraints of processor die area, power and memory. The OHR task itself involves pre-processing, character recognition and optional post-processing steps. The pre-processing step involves three processes: normalisation, feature extraction and segmentation [1]. Pre-processing is typically carried out in software, but implementing it in hardware releases resources on a PDA for other applications.

Normalisation aims to correct unwanted variations in the input. Typical processes include rotation, scaling, shear transform, deskewing, extrema location, centre-of-mass location, zone classification, smoothing, threshold-based anomaly exclusion, and resampling in the time and/or spatial domain.
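As an illustration of two of the normalisation processes named above (centre-of-mass location and scaling), the following is a minimal software sketch. It is not taken from the paper: the function names, the uniform-scaling policy and the `size` parameter are assumptions made for the example.

```python
def centre_of_mass(points):
    """Return the mean (x, y) of a non-empty list of pen samples."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return cx, cy


def normalise(points, size=1.0):
    """Move the centre of mass to the origin, then scale uniformly so the
    longer bounding-box side equals `size` (an assumed target extent)."""
    cx, cy = centre_of_mass(points)
    shifted = [(x - cx, y - cy) for x, y in points]
    xs = [x for x, _ in shifted]
    ys = [y for _, y in shifted]
    extent = max(max(xs) - min(xs), max(ys) - min(ys))
    if extent == 0:
        # Degenerate input (all samples coincide): nothing to scale.
        return shifted
    k = size / extent
    return [(x * k, y * k) for x, y in shifted]


# Hypothetical stroke: the four corners of a 20 x 40 rectangle.
stroke = [(10, 20), (30, 20), (30, 60), (10, 60)]
normalised = normalise(stroke)
```

Both steps reduce to accumulations, one division and a run of multiplications, which is the MAC-style arithmetic the paper identifies as a poor fit for a purely serial CORDIC unit.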
Feature extraction gives the recognizer its expected inputs in such formats as Euclidean distance, velocity or acceleration (requiring differentiation and integration), stroke or interstroke direction expressed as slope, angle, cosine, sine or curvature, and a measure of stroke curviness (stick, arc, curve). Segmentation entails classifying a sequence of sample points into strokes and characters, and relating them by order, connectivity and significance.

Operations typically required by these pre-processing tasks [2]-[5], [Dolfing] are listed below.

  • Multiplier/Accumulator (MAC)-based operations: differentiation/integration, filtering, resampling, scaling, centre-of-mass location
  • elementary functions: sqrt, sin, cos, tan, sinh, cosh, exp, ln
  • elementary transformations: rotation, shear transformation
  • compound operations, e.g. curvature, Hough transform
  • comparison-based operations: thresholding, extrema finding, intersection checking

However, these operations map to general-purpose architectures with varying levels of efficiency. This paper investigates implementations with a broad range of elementary function capabilities targeted at OHR pre-processing.

2. SYSTEM DESIGN

A simple but representative OHR task was selected, namely recognition of isolated handwritten digits. Test and training datasets of 7494 digit samples in the UNIPEN format were used [Alimoglu]. Feature extraction and segmentation routines were developed and debugged under the UNIPEN environment [UNIPEN]. The pre-processing and feature extraction tasks were as follows:

  • Input data was smoothed with a 3-point boxcar filter.
  • Bounding box, velocity (x, y, tangential) and curvature information was obtained by differentiation (using a second-order 5-point Laplacian filter as per UNIPEN [Dooijes]).
  • Optional first-order resampling.
  • Segmentation into strokes at points of maximum curvature.
  • Feature extraction for each stroke in each sample. The feature vectors used were length, direction and curviness, optionally augmented by delayed (interstroke) direction and delayed distance as per Dolfing [Dolfing].
  • Scalar quantisation to 4 levels was performed on each of the 3/4/5 features in the vector, hence the HMM used between 4^3 = 64 and 4^5 = 1024 observation symbols.
  • File export of the output to the HMM for recognition (in hardware this would be performed by block data transfer).

The pre-processing task above exhibits a heterogeneous instruction mix, as shown in Section 5.2.

A discrete-density Hidden Markov Model (HMM) recognizer was used with 3-9 states and 64-1024 observation symbols. A single model was used for each digit class. Initial HMM accuracy averaged 65% without training. Ensuring convergence of the training is currently under study, but expected results for the trained recognizer would be 96-99.8% character-level recognition. In addition, implementing vector quantisation is expected to give a further improvement.

3. PREPROCESSING IMPLEMENTATION

Implementation parameters are dictated by the application. In this case, the sampling rate of the pen input coordinates is typically 100 Hz at resolutions of 200 points/inch, with data values fully representable by wordlengths of 12-18 bits [1]. One algorithm that offers an unrivalled range of elementary function capabilities is the CORDIC (COordinate Rotation DIgital Computer) iterative algorithm [7].
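To make the CORDIC reference concrete, here is a minimal sketch of the standard circular rotation mode of the algorithm. It is written in floating point for readability and is not the fixed-point hybrid Multiplier/CORDIC unit the paper proposes. The function names and the default of 16 iterations are assumptions made for the example.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using circular-mode
    CORDIC. Converges for |angle| up to roughly 1.74 rad."""
    # Elementary angles atan(2^-i); a hardware unit would hold these in a
    # small lookup table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; accumulate that gain so it
    # can be divided out at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated through
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2^-i is a right shift in fixed-point hardware, so
        # each step needs only shifts and adds.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x / gain, y / gain


def cordic_sin_cos(angle, iterations=16):
    """Return (sin, cos) of `angle` by rotating the unit vector (1, 0)."""
    c, s = cordic_rotate(1.0, 0.0, angle, iterations)
    return s, c


s, c = cordic_sin_cos(math.pi / 6)
```

Each iteration refines the result by roughly one bit, so the cost grows with the wordlength. That serial behaviour is acceptable for rotations and trigonometric functions but slow for plain multiplication and division, which is the motivation given in the abstract for pairing CORDIC with a faster multiplier.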

Similar resources

Online Handwritten Gurmukhi Character Recognition

Computers are greatly influencing the lives of human beings and their usage is increasing at a tremendous rate. The ease with which we can exchange information between user and computer is of immense importance today because input devices such as keyboard and mouse have limitations vis-à-vis input through natural handwriting. We can use the online handwriting recognition process for a quick and...

Normalizing and restoring on-line handwriting

Preprocessing and normalization techniques for on-line handwriting analysis are crucial steps that usually compromise the success of recognition algorithms. These steps are often neglected and presented as solved problems, but this is far from the truth. An overview is presented of the principal on-line techniques for handwriting preprocessing and word normalization, covering the major difficu...

A Novel Approach for Online Handwriting Recognition of Tibetan Characters

A new method is proposed for online handwriting recognition of Tibetan characters. First, the input pattern is preprocessed. Then a direction feature matrix and an edge feature matrix of the Tibetan character are extracted and combined to form the original feature matrix. This is compressed into the final feature matrix with the IMLDA (image matrix linear discriminant analysis) technique. Fina...

Off-line Handwriting Recognition by Recurrent Error Propagation Networks

Recent years have seen an upsurge of interest in computer handwriting recognition as a means of making computers accessible to a wider range of people. A complete system for off-line, automatic recognition of handwriting is described, which takes word images scanned from a handwritten page and produces word-level output. Normalisation and preprocessing methods are described and details of the r...

Evaluation Approach of Arabic Character Recognition

This paper proposes and contributes towards designing a complete system for off-line Arabic character recognition. The proposed system is specifically meant for Arabic handwriting recognition, but it works equally well for typed character recognition. It has various phases including preprocessing and segmentation. It also includes a thinning phase and finds vertical and horizontal projection profi...

An Empirical Evaluation of Off-line Arabic Handwriting And Printed Characters Recognition System

Handwriting recognition is a challenging task for many real-world applications such as document authentication, form processing and historical documents. This paper presents a comparative study of off-line handwriting and printed character recognition systems, taking Arabic handwriting as the case. The off-line Handwriting Recognition methods for Arabic words which being often used among then across the...

Publication date: 1999